Abstract

We consider the following problem of decentralized statistical inference: given i.i.d. samples from an unknown distribution, estimate an arbitrary quantile subject to limits on the number of bits exchanged. We analyze a standard fusion-based architecture, in which each of m sensors transmits a single bit to the fusion center, which in turn is permitted to send some number k of bits of feedback. Supposing that each of the m sensors receives n observations, the optimal centralized protocol yields mean-squared error decaying as O(1/[nm]). We develop and analyze the performance of various decentralized protocols in comparison to this centralized gold standard. First, we describe a decentralized protocol based on k = log(m) bits of feedback that is strongly consistent, and achieves the same asymptotic MSE as the centralized optimum. Second, we describe and analyze a decentralized protocol based on only a single bit (k = 1) of feedback. For step sizes independent of m, it achieves an asymptotic MSE of order O(1/[n√m]), whereas for step sizes decaying as 1/√m, it achieves the same O(1/[nm]) decay in MSE as the centralized optimum. Our theoretical results are complemented by simulations, illustrating the tradeoffs between these different protocols.

Keywords: Decentralized inference; communication constraints; distributed estimation; non-parametric 
estimation; quantiles; sensor networks; stochastic approximation. 

I. Introduction 

Whereas classical statistical inference is performed in a centralized manner, many modern scientific problems and engineering systems are inherently decentralized: data are distributed, and cannot be aggregated due to various forms of communication constraints. An important example of such a decentralized system is a sensor network [6]: a set of spatially distributed sensors collect data about the environmental state (e.g., temperature, humidity or light). Typically, these networks are based on ad hoc deployments, in which the individual sensors are low-cost, and must operate under very severe power constraints (e.g., limited battery life). In statistical terms, such communication constraints imply that the individual sensors cannot transmit the raw data; rather, they must compress or quantize the data (for instance, by reducing a continuous-valued observation to a single bit) and can transmit only this compressed representation back to the fusion center.

Portions of this work were presented at the International Symposium on Information Theory, Seattle, WA, July 2006.

By now, there is a rich literature in both information theory and statistical signal processing on problems 
of decentralized statistical inference. A number of researchers, dating back to the seminal paper of Tenney 
and Sandell [16], have studied the problem of hypothesis testing under communication constraints; see
the survey papers [17], [18], [4], [19], [5] and references therein for overviews of this line of work. The 
hypothesis-testing problem has also been studied in the information theory community, where the analysis 
is asymptotic and Shannon-theoretic in nature [1], [11]. A parallel line of work deals with the problem of
decentralized estimation. Work in signal processing typically formulates it as a quantizer design problem 
and considers finite sample behavior [2], [8]; in contrast, the information-theoretic approach is asymptotic 
in nature, based on rate-distortion theory [20], [10]. In much of the literature on decentralized statistical 
inference, it is assumed that the underlying distributions are known with a specified parametric form 
(e.g., Gaussian). More recent work has addressed non-parametric and data-driven formulations of these 
problems, in which the decision-maker is simply provided samples from the unknown distribution [14], 
[13], [9]. For instance, Nguyen et al. [14] established statistical consistency for non-parametric approaches 
to decentralized hypothesis testing based on reproducing kernel Hilbert spaces. Luo [13] analyzed a non- 
parametric formulation of decentralized mean estimation, in which a fixed but unknown parameter is 
corrupted by noise with bounded support but otherwise arbitrary distribution, and showed that decentralized
approaches can achieve error rates that are order-optimal with respect to the centralized optimum. 

This paper addresses a different problem in decentralized non-parametric inference — namely, that of 
estimating an arbitrary quantile of an unknown distribution. Since there exists no unbiased estimator based 
on a single sample, we consider the performance of a network of m sensors, each of which collects a total 
of n observations in a sequential manner. Our analysis treats the standard fusion-based architecture, in 
which each of the m sensors transmits information to the fusion center via a communication-constrained 
channel. More concretely, at each observation round, each sensor is allowed to transmit a single bit to the fusion center, which in turn is permitted to send some number k of bits of feedback. For a decentralized protocol with k = log(m) bits of feedback, we prove that the algorithm achieves the order-optimal rate of the best centralized method (i.e., one with access to the full collection of raw data). We also consider a protocol that permits only a single bit of feedback, and establish that, with a suitably decaying step size, it achieves the same rate. This single-bit protocol is advantageous in that, for a fixed target mean-squared error of the quantile estimate, it yields longer sensor lifetimes than either the centralized or full-feedback protocols.

The remainder of the paper is organized as follows. We begin in Section II with background on quantile estimation and optimal rates in the centralized setting. We then describe two algorithms for solving the corresponding decentralized version, based on log(m) and 1 bit of feedback respectively, and provide an asymptotic characterization of their performance. These theoretical results are complemented with empirical simulations. Section III contains the analysis of these two algorithms. In Section IV, we consider various extensions, including the case of a number of feedback bits varying between the two extremes, and the effect of noise on the feedforward link. We conclude in Section V with a discussion.

II. Problem Set-up and Decentralized Algorithms 

In this section, we begin with some background material on (centralized) quantile estimation, before 
introducing our decentralized algorithms, and stating our main theoretical results. 

A. Centralized Quantile Estimation 

We begin with classical background on the problem of quantile estimation (see Serfling [15] for further details). Given a real-valued random variable X, let $F(x) := \mathbb{P}[X \le x]$ be its cumulative distribution function (CDF), which is non-decreasing and right-continuous. For any $0 < \alpha < 1$, the $\alpha$-th quantile of X is defined as $F^{-1}(\alpha) = \theta(\alpha) := \inf\{x \in \mathbb{R} \mid F(x) \ge \alpha\}$. Moreover, if F is continuous at $\theta(\alpha)$, then we have $\alpha = F(\theta(\alpha))$. As a particular example, for $\alpha = 0.5$, the associated quantile is simply the median.

Now suppose that for a fixed level $\alpha^* \in (0,1)$, we wish to estimate the quantile $\theta^* = \theta(\alpha^*)$. Rather than impose a particular parameterized form on F, we work in a non-parametric setting, in which we assume only that the distribution function F is differentiable, so that X has the density function $p_X(x) = F'(x)$ (w.r.t. Lebesgue measure), and moreover that $p_X(x) > 0$ for all $x \in \mathbb{R}$. In this setting, a standard estimator for $\theta^*$ is the sample quantile $\xi_N(\alpha^*) := F_N^{-1}(\alpha^*)$, where $F_N$ denotes the empirical distribution function based on i.i.d. samples $(X_1, \ldots, X_N)$. Under the conditions given above, it can be shown [15] that $\xi_N(\alpha^*)$ is strongly consistent for $\theta^*$ (i.e., $\xi_N(\alpha^*) \xrightarrow{a.s.} \theta^*$), and moreover that asymptotic normality holds:

$$\sqrt{N}\,\big(\xi_N(\alpha^*) - \theta^*\big) \xrightarrow{d} \mathcal{N}\Big(0, \frac{\alpha^*(1-\alpha^*)}{p_X^2(\theta^*)}\Big), \tag{1}$$

so that the asymptotic MSE decreases as O(1/N), where N is the total number of samples. Although this rate is optimal, the precise form of the asymptotic variance (1) need not be optimal in general; see Zielinski [21] for an in-depth discussion of the optimal asymptotic variances that can be obtained with variants of this basic estimator under different conditions.

Fig. 1. Sensor network for quantile estimation with m sensors. Each sensor is permitted to transmit a 1-bit message to the fusion center; in turn, the fusion center is permitted to broadcast k bits of feedback.

B. Distributed Quantile Estimation 

We consider the standard network architecture illustrated in Figure 1. There are m sensors, each of which has a dedicated two-way link to a fusion center. We assume that each sensor $i \in \{1, \ldots, m\}$ collects independent samples $X(i)$ of the random variable $X \in \mathbb{R}$ with distribution function $F(\theta) := \mathbb{P}[X \le \theta]$. We consider a sequential version of the quantile estimation problem, in which sensor i receives measurements $X_n(i)$ at time steps $n = 0, 1, 2, \ldots$, and the fusion center forms an estimate $\theta_n$ of the quantile. The key condition, which gives rise to the decentralized nature of the problem, is that communication between each sensor and the central processor is constrained, so that the sensor cannot simply relay its measurement $X(i)$ to the central location, but rather must perform local computation, and then transmit a summary statistic to the fusion center. More concretely, we impose the following restrictions on the protocol. First, at each time step $n = 0, 1, 2, \ldots$, each sensor $i = 1, \ldots, m$ can transmit a single bit $Y_n(i)$ to the fusion center. Second, the fusion center can broadcast k bits back to the sensor nodes at each time step. We analyze two distinct protocols, depending on whether k = log(m) or k = 1.

C. Protocol specification 

For each protocol, all sensors are initialized with some fixed $\theta_0$. The algorithms are specified in terms of a constant $K > 0$ and step sizes $\epsilon_n > 0$ that satisfy the conditions

$$\sum_{n=0}^{\infty} \epsilon_n = \infty \qquad \text{and} \qquad \sum_{n=0}^{\infty} \epsilon_n^2 < \infty. \tag{2}$$

The first condition ensures infinite travel (i.e., that the sequence $\theta_n$ can reach $\theta^*$ from any starting condition), whereas the second condition (which implies that $\epsilon_n \to 0$) is required for variance reduction. A standard choice satisfying these conditions, and the one that we assume herein, is $\epsilon_n = 1/n$. With this set-up, the log(m)-bit scheme consists of the steps given in Table I.

Algorithm: Decentralized quantile estimation with log(m)-bit feedback

Given $K > 0$ and variable step sizes $\epsilon_n > 0$:

(a) Local decision: each sensor computes the binary decision

$$Y_{n+1}(i) \equiv Y_{n+1}(i; \theta_n) := \mathbb{I}\big(X_{n+1}(i) \le \theta_n\big), \tag{3}$$

and transmits it to the fusion center.

(b) Parameter update: the fusion center updates its current estimate $\theta_{n+1}$ of the quantile parameter as follows:

$$\theta_{n+1} = \theta_n + \epsilon_n K \Big(\alpha^* - \frac{1}{m}\sum_{i=1}^m Y_{n+1}(i)\Big). \tag{4}$$

(c) Feedback: the fusion center broadcasts the m received bits $\{Y_{n+1}(1), \ldots, Y_{n+1}(m)\}$ back to the sensors. Each sensor can then compute the updated parameter $\theta_{n+1}$.

TABLE I: Description of the log(m)-bf algorithm.

Although the most straightforward feedback protocol is to broadcast back the m received bits $\{Y_{n+1}(1), \ldots, Y_{n+1}(m)\}$, as described in step (c), in fact it suffices to transmit only the log(m) bits required to perfectly describe the binomial random variable $\sum_{i=1}^m Y_{n+1}(i)$ in order to update $\theta_n$. In either case, after the feedback step, each sensor knows the value of the sum $\sum_{i=1}^m Y_{n+1}(i)$, which (in conjunction with knowledge of m, K, $\alpha^*$ and $\epsilon_n$) allows it to compute the updated parameter $\theta_{n+1}$. Finally, knowledge of $\theta_{n+1}$ allows each sensor to compute the local decision (3) in the following round.
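To make the feedback step concrete, here is a minimal Python sketch of one round of the log(m)-bf protocol in Table I. The names `log_m_round` and `feedback_bits` are hypothetical helpers introduced only for illustration; the sketch assumes the fusion center feeds back just the integer count $\sum_i Y_{n+1}(i)$, which every sensor can plug into update (4).

```python
import math
from typing import Sequence, Tuple


def log_m_round(samples: Sequence[float], theta: float, alpha: float,
                K: float, eps: float) -> Tuple[int, float]:
    """One round of the log(m)-bf protocol (Table I), as a sketch.

    samples: the new observations X_{n+1}(1), ..., X_{n+1}(m), one per sensor.
    Returns the fed-back count and the updated estimate theta_{n+1}.
    """
    m = len(samples)
    # (a) each sensor forms its bit Y_{n+1}(i) = I(X_{n+1}(i) <= theta_n)
    bits = [1 if x <= theta else 0 for x in samples]
    # (c) it suffices to feed back the sum of the bits, an integer in {0, ..., m}
    count = sum(bits)
    # (b) every sensor can apply update (4) from the count alone
    theta_next = theta + eps * K * (alpha - count / m)
    return count, theta_next


def feedback_bits(m: int) -> int:
    """Number of bits needed to describe an integer in {0, ..., m}."""
    return math.ceil(math.log2(m + 1))
```

Strictly, a count in {0, ..., m} needs ⌈log₂(m+1)⌉ bits, which is what the text refers to as log(m) bits.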

The 1-bit feedback scheme detailed in Table II is similar, except that it requires broadcasting only a single bit ($Z_{n+1}$), and involves an extra step size parameter $K_m$, which is specified in the statement of Theorem 2. After the feedback step of the 1-bf algorithm, each sensor has knowledge of the aggregate decision $Z_{n+1}$, which (in conjunction with $\epsilon_n$, $K_m$ and the constant $\beta$) allows it to compute the updated parameter $\theta_{n+1}$. Knowledge of this parameter suffices to compute the local decision (5).

D. Convergence results 

We now state our main results on the convergence behavior of these two distributed protocols. In all cases, we assume the step size choice $\epsilon_n = 1/n$. Given fixed $\alpha^* \in (0,1)$, we use $\theta^*$ to denote the $\alpha^*$-level quantile (i.e., such that $\mathbb{P}(X \le \theta^*) = \alpha^*$); note that our assumption of a strictly positive density guarantees that $\theta^*$ is unique.

Algorithm: Decentralized quantile estimation with 1-bit feedback

Given $K_m > 0$ (possibly depending on the number of sensors m) and variable step sizes $\epsilon_n > 0$:

(a) Local decision: each sensor computes the binary decision

$$Y_{n+1}(i) = \mathbb{I}\big(X_{n+1}(i) \le \theta_n\big) \tag{5}$$

and transmits it to the fusion center.

(b) Aggregate decision and parameter update: The fusion center computes the aggregate decision

$$Z_{n+1} = \mathbb{I}\Big(\frac{1}{m}\sum_{i=1}^m Y_{n+1}(i) \le \alpha^*\Big) \tag{6}$$

and uses it to update the parameter according to

$$\theta_{n+1} = \theta_n + \epsilon_n K_m \big(Z_{n+1} - \beta\big), \tag{7}$$

where the constant $\beta$ is chosen as

$$\beta = \sum_{i=0}^{\lfloor m\alpha^* \rfloor} \binom{m}{i} (\alpha^*)^i (1-\alpha^*)^{m-i}. \tag{8}$$

(c) Feedback: The fusion center broadcasts the aggregate decision $Z_{n+1}$ back to the sensor nodes (one bit of feedback). Each sensor can then compute the updated parameter $\theta_{n+1}$.

TABLE II: Description of the 1-bf algorithm.
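Analogously, one round of the 1-bf protocol in Table II can be sketched as follows. Again, `one_bit_round` and `G` are illustrative names (assumptions, not part of the original specification). The offset $\beta$ in (8) is computed with `math.comb`; this is adequate for moderate m, but the float conversion can overflow for very large m (roughly m above 1000).

```python
import math
from typing import Sequence, Tuple


def G(m: int, r: float, y: float) -> float:
    """G_m(r, y): probability that a Binomial(m, r) variable is at most floor(m * y)."""
    top = min(m, math.floor(m * y))
    return sum(math.comb(m, i) * r**i * (1 - r) ** (m - i) for i in range(top + 1))


def one_bit_round(samples: Sequence[float], theta: float, alpha: float,
                  K_m: float, eps: float) -> Tuple[int, float]:
    """One round of the 1-bf protocol (Table II), as a sketch."""
    m = len(samples)
    # (a) local bits Y_{n+1}(i) = I(X_{n+1}(i) <= theta_n), summed at the fusion center
    total = sum(1 for x in samples if x <= theta)
    # (b) aggregate decision (6): sum/m <= alpha*  <=>  sum <= floor(m * alpha*)
    z = 1 if total <= math.floor(m * alpha) else 0
    beta = G(m, alpha, alpha)  # offset (8), i.e. G_m(alpha*, alpha*)
    # (c) only the single bit z is broadcast; each sensor applies update (7)
    return z, theta + eps * K_m * (z - beta)
```

Comparing the integer count with ⌊mα*⌋ avoids a floating-point comparison of `total / m` with α*.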

Theorem 1 (m-bit feedback): For any $\alpha^* \in (0,1)$, consider a random sequence $\{\theta_n\}$ generated by the m-bit feedback protocol. Then

(a) For all initial conditions $\theta_0$, the sequence $\theta_n$ converges almost surely to the $\alpha^*$-quantile $\theta^*$.

(b) Moreover, if the constant K is chosen to satisfy $p_X(\theta^*)\,K > \frac{1}{2}$, then

$$\sqrt{n}\,(\theta_n - \theta^*) \xrightarrow{d} \mathcal{N}\Big(0, \frac{K^2\,\alpha^*(1-\alpha^*)}{[2K p_X(\theta^*) - 1]\,m}\Big),$$

so that the asymptotic MSE is $O(\frac{1}{mn})$.

Remarks: After n steps of this decentralized protocol, a total of $N = nm$ observations have been made, so that our discussion in Section II-A dictates (see equation (1)) that the optimal asymptotic MSE is $O(\frac{1}{mn})$. Interestingly, then, the log(m)-bit feedback decentralized protocol is order-optimal with respect to the centralized gold standard.
Before stating the analogous result for the 1-bit feedback protocol, we introduce some useful notation. First, we define for any fixed $\theta \in \mathbb{R}$ the random variable

$$Y(\theta) := \frac{1}{m}\sum_{i=1}^m Y(i;\theta) = \frac{1}{m}\sum_{i=1}^m \mathbb{I}\big(X(i) \le \theta\big).$$

Note that for each fixed $\theta$, the scaled variable $m\,Y(\theta)$ is binomial with parameters m and $F(\theta)$. It is convenient to define the function

$$G_m(r, y) := \sum_{i=0}^{\lfloor my \rfloor} \binom{m}{i} r^i (1-r)^{m-i}, \tag{10}$$

with domain $(r, y) \in [0,1] \times [0,1]$. With this notation, we have

$$\mathbb{P}\big(Y(\theta) \le y\big) = G_m\big(F(\theta), y\big).$$

Again, we fix an arbitrary $\alpha^* \in (0,1)$ and let $\theta^*$ be the associated $\alpha^*$-quantile satisfying $\mathbb{P}(X \le \theta^*) = \alpha^*$.
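The identity $\mathbb{P}(Y(\theta) \le y) = G_m(F(\theta), y)$ can be checked numerically. The sketch below assumes X ~ Uniform[0, 1], so that $F(\theta) = \theta$, and compares a Monte Carlo estimate with the exact binomial sum (10); the function names and the choice of distribution are assumptions made only for illustration.

```python
import math
import random


def G(m: int, r: float, y: float) -> float:
    """G_m(r, y) from (10): P(Binomial(m, r) <= floor(m * y))."""
    top = min(m, math.floor(m * y))
    return sum(math.comb(m, i) * r**i * (1 - r) ** (m - i) for i in range(top + 1))


def monte_carlo_prob(m: int, theta: float, y: float,
                     trials: int = 20000, seed: int = 0) -> float:
    """Estimate P(Y(theta) <= y) when each X(i) ~ Uniform[0, 1], so F(theta) = theta."""
    rng = random.Random(seed)
    top = math.floor(m * y)
    hits = 0
    for _ in range(trials):
        # m * Y(theta) = number of sensors whose observation falls at or below theta
        count = sum(rng.random() <= theta for _ in range(m))
        hits += count <= top  # Y(theta) <= y  <=>  count <= floor(m * y)
    return hits / trials
```

Working with the integer count rather than the ratio $Y(\theta)$ keeps the comparison consistent with the floor in (10).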



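Theorem 2 (1-bit feedback): Given a random sequence $\{\theta_n\}$ generated by the 1-bit feedback protocol, we have

(a) For any initial condition, the sequence $\theta_n \to \theta^*$ almost surely.

(b) Suppose that the step size $K_m$ is chosen such that $\gamma_m(\theta^*) > \frac{1}{2}$, where

$$\gamma_m(\theta^*) := -K_m\, p_X(\theta^*)\, \frac{\partial G_m}{\partial r}(r, \alpha^*)\Big|_{r=\alpha^*}, \tag{11}$$

or equivalently such that $K_m > \Big[2 p_X(\theta^*) \big|\tfrac{\partial G_m}{\partial r}(\alpha^*, \alpha^*)\big|\Big]^{-1}$. Then

$$\sqrt{n}\,(\theta_n - \theta^*) \xrightarrow{d} \mathcal{N}\Big(0, \frac{K_m^2\, G_m(\alpha^*,\alpha^*)\,[1 - G_m(\alpha^*,\alpha^*)]}{2\gamma_m(\theta^*) - 1}\Big). \tag{12}$$

(c) If we choose a constant step size $K_m = K$, then as $m \to \infty$, the asymptotic variance in (12) behaves as

$$\frac{K^2\sqrt{2\pi\alpha^*(1-\alpha^*)}}{8K p_X(\theta^*)\sqrt{m} - 4\sqrt{2\pi\alpha^*(1-\alpha^*)}}, \tag{13}$$

so that the asymptotic MSE is $O(\frac{1}{n\sqrt{m}})$.

(d) If we choose a decaying step size $K_m = \frac{K}{\sqrt{m}}$, then the asymptotic variance behaves as

$$\frac{1}{m}\cdot\frac{K^2\sqrt{2\pi\alpha^*(1-\alpha^*)}}{8K p_X(\theta^*) - 4\sqrt{2\pi\alpha^*(1-\alpha^*)}}, \tag{14}$$

so that the asymptotic MSE is $O(\frac{1}{nm})$.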
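For finite m, the variance (12) can be evaluated exactly and compared with the large-m approximations (13) and (14). This sketch uses the representation of $\partial G_m/\partial r$ from Lemma 2(a) (stated in Section III) and computes binomial probabilities in log space; `var_exact`, `var_approx` and the test values (m = 400, K = 1, $p_X(\theta^*) = 1$, $\alpha^* = 0.3$) are illustrative assumptions.

```python
import math


def binom_pmf(m: int, i: int, r: float) -> float:
    """Binomial(m, r) pmf, computed in log space so that large m does not overflow."""
    return math.exp(math.lgamma(m + 1) - math.lgamma(i + 1) - math.lgamma(m - i + 1)
                    + i * math.log(r) + (m - i) * math.log(1 - r))


def G_and_dG(m: int, alpha: float):
    """G_m(alpha, alpha) and dG_m/dr at r = alpha (Lemma 2(a) form), for 0 < alpha < 1."""
    top = math.floor(m * alpha)
    pmf = [binom_pmf(m, i, alpha) for i in range(top + 1)]
    G = sum(pmf)
    dG = sum(p * (i - m * alpha) for i, p in enumerate(pmf)) / (alpha * (1 - alpha))
    return G, dG


def var_exact(m: int, K_m: float, p: float, alpha: float) -> float:
    """Asymptotic variance (12) with gamma_m from (11); p stands for p_X(theta*)."""
    G, dG = G_and_dG(m, alpha)
    gamma = -K_m * p * dG
    if gamma <= 0.5:
        raise ValueError("stability condition gamma_m(theta*) > 1/2 fails")
    return K_m**2 * G * (1 - G) / (2 * gamma - 1)


def var_approx(m: int, K: float, p: float, alpha: float, decaying: bool) -> float:
    """Large-m approximations: (14) for K_m = K / sqrt(m), (13) for constant K_m = K."""
    c = math.sqrt(2 * math.pi * alpha * (1 - alpha))
    if decaying:
        return K**2 * c / (m * (8 * K * p - 4 * c))
    return K**2 * c / (8 * K * p * math.sqrt(m) - 4 * c)
```

At m = 400 the exact and approximate values already agree to within a few percent for both step-size choices.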
E. Comparative Analysis

It is interesting to compare the performance of each proposed decentralized algorithm to the centralized performance. Considering first the log(m)-bf scheme, suppose that we set $K = 1/p_X(\theta^*)$. Using the formula for the asymptotic variance from Theorem 1, we obtain that the asymptotic variance of the m-bf scheme with this choice of K is given by $\frac{\alpha^*(1-\alpha^*)}{p_X^2(\theta^*)\, m}$; after n rounds, the MSE therefore behaves as $\frac{\alpha^*(1-\alpha^*)}{p_X^2(\theta^*)\, N}$ with $N = nm$, thus matching the asymptotics of the centralized quantile estimator (1). In fact, it can be shown that the choice $K = 1/p_X(\theta^*)$ is optimal in the sense of minimizing the asymptotic variance for our scheme, when K is constrained by the stability criterion in Theorem 1. In practice, however, the value $p_X(\theta^*)$ is typically not known, so that it may not be possible to implement exactly this scheme. An interesting question is whether an adaptive scheme could be used to estimate $p_X(\theta^*)$ (and hence the optimal K) simultaneously, thereby achieving this optimal asymptotic variance. We leave this question open as an interesting direction for future work.

Turning now to the algorithm 1-bf, if we write $K = \tilde{K}\sqrt{2\pi\alpha^*(1-\alpha^*)}$ in equation (14), then we obtain the asymptotic variance

$$\frac{\pi}{2}\cdot\frac{\tilde{K}^2\,\alpha^*(1-\alpha^*)}{[2\tilde{K} p_X(\theta^*) - 1]\, m}. \tag{15}$$

Since the stability criterion ($2\tilde{K} p_X(\theta^*) > 1$) is the same as that for m-bf, the optimal choice is $\tilde{K} = 1/p_X(\theta^*)$. Consequently, while the $O(1/[mn])$ rate is the same as for both the centralized and decentralized m-bf protocols, the pre-factor for the 1-bf algorithm is $\pi/2 \approx 1.57$ times larger than that of the optimized m-bf scheme. However, despite this loss in the pre-factor, the 1-bf protocol has substantial advantages over the m-bf; in particular, the network lifetime scales as O(m) compared to O(m/log(m)) for the log(m)-bf scheme.
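The arithmetic behind this comparison is easy to check numerically. The short sketch below encodes the variance formula of Theorem 1(b) and equation (15), confirms by grid search that $K = 1/p_X(\theta^*)$ minimizes the m-bf variance, and recovers the $\pi/2$ ratio; the numerical values of $p_X(\theta^*)$, $\alpha^*$ and m are arbitrary assumptions chosen for illustration.

```python
import math


def var_mbf(K: float, p: float, alpha: float, m: int) -> float:
    """Asymptotic variance of sqrt(n)(theta_n - theta*) for log(m)-bf (Theorem 1(b))."""
    if 2 * K * p <= 1:
        raise ValueError("stability requires K * p_X(theta*) > 1/2")
    return K**2 * alpha * (1 - alpha) / ((2 * K * p - 1) * m)


def var_1bf(K_tilde: float, p: float, alpha: float, m: int) -> float:
    """Variance (15) of 1-bf with K_m = K_tilde * sqrt(2*pi*alpha*(1 - alpha) / m)."""
    return (math.pi / 2) * var_mbf(K_tilde, p, alpha, m)


p, alpha, m = 2.0, 0.3, 100  # illustrative values of p_X(theta*), alpha*, m
# Grid search over the stable range K > 1/(2p); the minimiser should be K = 1/p.
grid = [0.26 + 0.001 * j for j in range(2000)]
K_best = min(grid, key=lambda K: var_mbf(K, p, alpha, m))
# Centralized benchmark (1) with N = nm, expressed per round: alpha(1-alpha) / (p^2 m)
centralized = alpha * (1 - alpha) / (p**2 * m)
```

Because (15) is just $\pi/2$ times the m-bf expression with $\tilde{K}$ in place of K, both schemes share the same optimal gain and the same stability boundary.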

F. Simulation example 

We now provide some simulation results in order to illustrate the two decentralized protocols, and the 
agreement between theory and practice. In particular, we consider the quantile estimation problem when 
the underlying distribution (which, of course, is unknown to the algorithm) is uniform on [0, 1]. In this 
case, we have $p_X(x) = 1$ uniformly for all $x \in [0,1]$, so that taking the constant K = 1 ensures that the stability conditions in both Theorems 1 and 2 are satisfied. We simulate the behavior of both algorithms for $\alpha^* = 0.3$ over a range of choices for the network size m. Figure 2(a) illustrates several sample paths of the m-bit feedback protocol, showing the convergence to the correct $\theta^*$.

For comparison to our theory, we measure the empirical variance by averaging the error $e_n = \sqrt{n}(\theta_n - \theta^*)$ over L = 20 runs. The normalization by $\sqrt{n}$ is used to isolate the effect of increasing m, the number of nodes in the network. We estimate the variance by running the algorithm for n = 2000 steps, and computing the empirical variance of $e_n$ for time steps n = 1800 through n = 2000. Figure 2(b) shows these empirically computed variances, and a comparison to the theoretical predictions of Theorems 1 and 2 for constant step size; note the excellent agreement between theory and practice. Panel (c) shows the comparison between the log(m)-bf algorithm and the 1-bf algorithm with decaying $1/\sqrt{m}$ step size. Here the asymptotic MSE of both algorithms decays like 1/m for m up to roughly 500; after this point, our fixed choice of n is insufficient to reveal the asymptotic behavior.

Fig. 2. (a) Convergence of $\theta_n$ to $\theta^*$ with m = 11 nodes, and quantile level $\alpha^* = 0.3$. (b) Log-log plots of the variance against m for both algorithms (log(m)-bf and 1-bf) with constant step sizes, and comparison to the theoretically predicted rate (solid straight lines). (c) Log-log plots of log(m)-bf with constant step size versus the 1-bf algorithm with decaying step size.
[Figure panels: (a) estimate versus number of iterations; (b), (c) variance versus m (number of sensors).]
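The experiment described above can be reproduced in outline with a short NumPy simulation. This is a sketch under stated assumptions rather than the code behind Figure 2: X ~ Uniform[0, 1], $\theta_0 = 0.5$, step sizes $\epsilon_n = 1/(n+1)$ (i.e., 1/n with rounds indexed from 1), and $K_m = K/\sqrt{m}$ for the 1-bf protocol; the function name `simulate` is hypothetical.

```python
import math

import numpy as np


def simulate(protocol: str, m: int, n: int = 2000, runs: int = 20, alpha: float = 0.3,
             K: float = 1.0, theta0: float = 0.5, seed: int = 0) -> np.ndarray:
    """Simulate the log(m)-bf ("log_m") or 1-bf ("one_bit") protocol with X ~ Uniform[0, 1].

    Returns theta_n for each independent run, as an array of shape (runs,).
    """
    rng = np.random.default_rng(seed)
    theta = np.full(runs, theta0)
    top = math.floor(m * alpha)
    # beta from (8): P(Binomial(m, alpha) <= floor(m * alpha)); fine for moderate m
    beta = sum(math.comb(m, i) * alpha**i * (1 - alpha) ** (m - i) for i in range(top + 1))
    K_m = K / math.sqrt(m)  # decaying step size for 1-bf (Theorem 2(d))
    for t in range(n):
        eps = 1.0 / (t + 1)
        x = rng.random((runs, m))
        counts = (x <= theta[:, None]).sum(axis=1)  # sum of the m local bits (3)/(5)
        if protocol == "log_m":
            theta = theta + eps * K * (alpha - counts / m)  # update (4)
        elif protocol == "one_bit":
            z = (counts <= top).astype(float)  # aggregate decision (6)
            theta = theta + eps * K_m * (z - beta)  # update (7)
        else:
            raise ValueError(f"unknown protocol: {protocol}")
    return theta
```

For example, `np.var(np.sqrt(2000) * (simulate("log_m", 10, runs=200) - 0.3))` should be close to the Theorem 1(b) value $\alpha^*(1-\alpha^*)/m = 0.021$ (with $K = p_X(\theta^*) = 1$). Averaging over rounds 1800 to 2000, as in the text, would further reduce Monte Carlo noise.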

III. Analysis

In this section, we turn to the proofs of Theorems 1 and 2, which exploit results from the stochastic approximation literature [12], [3]. In particular, both types of parameter updates (4) and (7) can be written in the general form

$$\theta_{n+1} = \theta_n + \epsilon_n H(\theta_n, Y_{n+1}), \tag{16}$$

where $Y_{n+1} = (Y_{n+1}(1), \ldots, Y_{n+1}(m))$. Note that the step size choice $\epsilon_n = 1/n$ satisfies the conditions in equation (2). Moreover, the sequence $(\theta_n, Y_{n+1})$ is Markov, since $\theta_n$ and $Y_{n+1}$ depend on the past only via $\theta_{n-1}$ and $Y_n$. We begin by stating some known results from stochastic approximation, applicable to such Markov sequences, that will be used in our analysis.

For each fixed $\theta \in \mathbb{R}$, let $\mu_\theta(\cdot)$ denote the distribution of Y conditioned on $\theta$. A key quantity in the analysis of stochastic approximation algorithms is the averaged function

$$h(\theta) := \int H(\theta, y)\, \mu_\theta(dy) = \mathbb{E}[H(\theta, Y) \mid \theta]. \tag{17}$$

We assume (as is true for our cases) that this expectation exists. Now the differential equation method dictates that under suitable conditions, the asymptotic behavior of the update (16) is determined essentially by the behavior of the ODE $\frac{d\theta}{dt} = h(\theta(t))$.

Almost sure convergence: Suppose that the following attractiveness condition

$$h(\theta)\,[\theta - \theta^*] < 0 \quad \text{for all } \theta \ne \theta^* \tag{18}$$

is satisfied. If, in addition, the variance $R(\theta) := \mathrm{Var}[H(\theta; Y) \mid \theta]$ is bounded, then we are guaranteed that $\theta_n \xrightarrow{a.s.} \theta^*$ (see §5.1 in Benveniste et al. [3]).

Asymptotic normality: In our updates, the random variables $Y_{n+1}$ take the form $Y_{n+1} = g(X_{n+1}, \theta_n)$, where the $X_n$ are i.i.d. random variables. Suppose that the following stability condition is satisfied:

$$\gamma(\theta^*) := -\frac{dh}{d\theta}(\theta^*) > \frac{1}{2}. \tag{19}$$

Then we have

$$\sqrt{n}\,(\theta_n - \theta^*) \xrightarrow{d} \mathcal{N}\Big(0, \frac{R(\theta^*)}{2\gamma(\theta^*) - 1}\Big). \tag{20}$$

See §3.1.2 in Benveniste et al. [3] for further details.



A. Proof of Theorem 1

(a) The m-bit feedback algorithm is a special case of the general update (16), with $\epsilon_n = \frac{1}{n}$ and $H(\theta_n, Y_{n+1}) = K\big[\alpha^* - \frac{1}{m}\sum_{i=1}^m Y_{n+1}(i; \theta_n)\big]$. Computing the averaged function (17), we have

$$h(\theta) = K\,\mathbb{E}\Big[\alpha^* - \frac{1}{m}\sum_{i=1}^m Y_{n+1}(i;\theta)\Big] = K\big(\alpha^* - F(\theta)\big),$$

where $F(\theta) = \mathbb{P}(X \le \theta)$. We then observe that $\theta^*$ satisfies the attractiveness condition (18), since

$$[\theta - \theta^*]\, h(\theta) = K\,[\theta - \theta^*]\,[\alpha^* - F(\theta)] < 0$$

for all $\theta \ne \theta^*$, by the (strict) monotonicity of the cumulative distribution function. Finally, we compute the conditional variance of H as follows:

$$R(\theta) = \frac{K^2}{m^2}\,\mathrm{Var}\Big[\sum_{i=1}^m Y_{n+1}(i) \,\Big|\, \theta\Big] = \frac{K^2}{m}\, F(\theta)\,[1 - F(\theta)] \le \frac{K^2}{4m}, \tag{21}$$

using the fact that H is an affine function of a sum of m Bernoulli variables that are conditionally i.i.d. (given $\theta_n$). Thus, we can conclude that $\theta_n \to \theta^*$ almost surely.

(b) Note that $\gamma(\theta^*) = -\frac{dh}{d\theta}(\theta^*) = K p_X(\theta^*) > \frac{1}{2}$, so that the stability condition (19) holds. Applying the asymptotic normality result (20) with the variance $R(\theta^*) = \frac{K^2}{m}\alpha^*(1-\alpha^*)$ (computed from equation (21)) yields the claim.

■ 

B. Proof of Theorem 2

This argument involves additional analysis, due to the aggregate decision (6) taken by the fusion center. Since the decision $Z_{n+1}$ is a Bernoulli random variable, we begin by computing its parameter. Each transmitted bit $Y_{n+1}(i)$ is $\mathrm{Ber}(F(\theta_n))$, where we recall the notation $F(\theta) := \mathbb{P}(X \le \theta)$. Using the definition (10), we have the equivalences

$$\mathbb{P}(Z_{n+1} = 1) = G_m\big(F(\theta_n), \alpha^*\big), \tag{22a}$$
$$\beta = G_m(\alpha^*, \alpha^*) = G_m\big(F(\theta^*), \alpha^*\big). \tag{22b}$$

We start with the following result:

Lemma 1: For fixed $x \in [0,1]$, the function $f(r) := G_m(r, x)$ is non-negative, differentiable and monotonically decreasing.

Proof: Non-negativity and differentiability are immediate. To establish monotonicity, note that $f(r) = \mathbb{P}\big(\sum_{i=1}^m Y_i \le xm\big)$, where the $Y_i$ are i.i.d. $\mathrm{Ber}(r)$ variates. Consider a second sequence of i.i.d. $\mathrm{Ber}(r')$ variates $Y_i'$ with $r' > r$. Then the sum $\sum_{i=1}^m Y_i'$ stochastically dominates $\sum_{i=1}^m Y_i$, so that $f(r') \le f(r)$, as required. ■

To establish almost sure convergence, we use a similar approach as in the previous theorem. Using the equivalences (22), we compute the function h as follows:

$$h(\theta) = K_m\,\mathbb{E}[Z_{n+1} - \beta \mid \theta] = K_m\big[G_m(F(\theta), \alpha^*) - G_m(F(\theta^*), \alpha^*)\big].$$

Next we establish the attractiveness condition (18). In particular, for any $\theta$ such that $F(\theta) \ne F(\theta^*)$, we calculate that $h(\theta)[\theta - \theta^*]$ is given by

$$K_m\big[G_m(F(\theta), \alpha^*) - G_m(F(\theta^*), \alpha^*)\big]\,[\theta - \theta^*] < 0,$$

where the inequality follows from the fact that $G_m(r, x)$ is monotonically decreasing in r for each fixed $x \in [0,1]$ (using Lemma 1), and that the function F is monotonically increasing. Finally, computing the variance $R(\theta) := \mathrm{Var}[H(\theta, Y) \mid \theta]$, we have

$$R(\theta) = K_m^2\, G_m(F(\theta), \alpha^*)\,\big[1 - G_m(F(\theta), \alpha^*)\big] \le \frac{K_m^2}{4},$$
February 1, 2008 DRAFT 
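since (conditioned on $\theta$) the decision $Z_{n+1}$ is Bernoulli with parameter $G_m(F(\theta), \alpha^*)$. Thus, we can conclude that $\theta_n \to \theta^*$ almost surely.

(b) To show asymptotic normality, we need to verify the stability condition. By the chain rule, we have

$$\frac{dh}{d\theta}(\theta^*) = K_m\, \frac{\partial G_m}{\partial r}(r, \alpha^*)\Big|_{r=F(\theta^*)}\, p_X(\theta^*).$$

From Lemma 1, we have $\frac{\partial G_m}{\partial r}(F(\theta^*), \alpha^*) \le 0$, so that the stability condition holds as long as $\gamma_m(\theta^*) > \frac{1}{2}$ (where $\gamma_m$ is defined in the statement). Thus, asymptotic normality holds.

In order to compute the asymptotic variance, we need to investigate the behavior of $R(\theta^*)$ and $\gamma_m(\theta^*)$ as $m \to +\infty$. First, since $F(\theta^*) = \alpha^*$, the central limit theorem guarantees that $\sqrt{m}\,\big(Y(\theta^*) - \alpha^*\big) \xrightarrow{d} \mathcal{N}\big(0, \alpha^*(1-\alpha^*)\big)$, and hence that $G_m(F(\theta^*), \alpha^*) = \mathbb{P}\big(Y(\theta^*) \le \alpha^*\big) \to \frac{1}{2}$. Consequently, we have

$$R(\theta^*) = K_m^2\, G_m(F(\theta^*), \alpha^*)\,\big[1 - G_m(F(\theta^*), \alpha^*)\big] \sim \frac{K_m^2}{4}.$$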

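We now turn to the behavior of $\gamma_m(\theta^*)$. We first prove a lemma to characterize the asymptotic behavior of $\frac{\partial G_m}{\partial r}(r, \alpha^*)$:

Lemma 2: (a) The partial derivative of $G_m(r, x)$ with respect to r is given by

$$\frac{\partial G_m(r,x)}{\partial r} = \frac{\mathbb{E}[X\,\mathbb{I}(X \le xm)] - \mathbb{E}[X]\,\mathbb{E}[\mathbb{I}(X \le xm)]}{r(1-r)},$$

where X is binomial with parameters (m, r), and mean $\mathbb{E}[X] = rm$.

(b) Moreover, as $m \to +\infty$, we have

$$\frac{\partial G_m(r, \alpha^*)}{\partial r}\Big|_{r=F(\theta^*)} \sim -\sqrt{\frac{m}{2\pi\alpha^*(1-\alpha^*)}}. \tag{23}$$

Proof: (a) Computing the partial derivative, we have

$$\begin{aligned}
\frac{\partial G_m(r,x)}{\partial r} &= \sum_{i=0}^{\lfloor mx \rfloor} \binom{m}{i}\,(i - mr)\, r^{i-1}(1-r)^{m-i-1} \\
&= \frac{1}{r(1-r)}\Big[\sum_{i=0}^{\lfloor mx \rfloor} i\binom{m}{i} r^i(1-r)^{m-i} - mr\sum_{i=0}^{\lfloor mx \rfloor} \binom{m}{i} r^i(1-r)^{m-i}\Big] \\
&= \frac{1}{r(1-r)}\big(\mathbb{E}[X\,\mathbb{I}(X \le mx)] - \mathbb{E}[X]\,\mathbb{E}[\mathbb{I}(X \le mx)]\big),
\end{aligned}$$

as claimed.

(b) We derive this limiting behavior by applying classical asymptotics to the form of $\frac{\partial G_m}{\partial r}$ given in part (a), evaluated at $r = F(\theta^*) = \alpha^*$ and $x = \alpha^*$, so that X is binomial with parameters $(m, \alpha^*)$. Defining $Z_m := \frac{X - \alpha^* m}{\sqrt{m}}$, the central limit theorem yields that

$$Z_m \xrightarrow{d} Z \sim \mathcal{N}(0, \sigma^2), \qquad \sigma^2 := \alpha^*(1-\alpha^*). \tag{24}$$

Moreover, in this binomial case, we actually have $\mathbb{E}[|Z_m|] \to \mathbb{E}[|Z|] = \sqrt{2\sigma^2/\pi}$. First, since $\mathbb{E}[X] = \alpha^* m$ and $\mathbb{E}[\mathbb{I}(X \le \alpha^* m)] \to \frac{1}{2}$ by the CLT, we have

$$\frac{1}{m}\,\mathbb{E}[X]\,\mathbb{E}[\mathbb{I}(X \le \alpha^* m)] \to \frac{\alpha^*}{2}. \tag{25}$$

Let us now re-write the first term in the representation of $\frac{\partial G_m}{\partial r}$ from part (a) as

$$\mathbb{E}[X\,\mathbb{I}(X \le \alpha^* m)] = \alpha^* m\,\mathbb{E}[\mathbb{I}(X \le \alpha^* m)] + \sqrt{m}\,\mathbb{E}[Z_m\,\mathbb{I}(Z_m \le 0)],$$

where

$$\mathbb{E}[Z_m\,\mathbb{I}(Z_m \le 0)] \to \mathbb{E}[Z\,\mathbb{I}(Z \le 0)] = -\frac{1}{2}\mathbb{E}[|Z|] = -\sqrt{\frac{\sigma^2}{2\pi}}. \tag{26}$$

Putting together the limits (25) and (26), the terms of order m cancel exactly (since $\mathbb{E}[X] = \alpha^* m$), and we conclude that $\frac{1}{\sqrt{m}}\frac{\partial G_m(r,\alpha^*)}{\partial r}\big|_{r=\alpha^*}$ converges to

$$\frac{1}{\alpha^*(1-\alpha^*)}\Big(-\sqrt{\frac{\alpha^*(1-\alpha^*)}{2\pi}}\Big) = -\frac{1}{\sqrt{2\pi\alpha^*(1-\alpha^*)}},$$

as claimed. ■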
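Both parts of Lemma 2, and the monotonicity in Lemma 1, are easy to sanity-check numerically. The sketch below (helper names are hypothetical) compares the Lemma 2(a) formula with a central finite difference and checks the scaling (23) for a moderately large m; binomial probabilities are computed in log space to avoid overflow.

```python
import math


def binom_pmf(m: int, i: int, r: float) -> float:
    """Binomial(m, r) pmf in log space (valid for 0 < r < 1)."""
    return math.exp(math.lgamma(m + 1) - math.lgamma(i + 1) - math.lgamma(m - i + 1)
                    + i * math.log(r) + (m - i) * math.log(1 - r))


def G(m: int, r: float, x: float) -> float:
    """G_m(r, x) = P(X <= m x) for X ~ Binomial(m, r), as in (10)."""
    return sum(binom_pmf(m, i, r) for i in range(min(m, math.floor(m * x)) + 1))


def dG_dr(m: int, r: float, x: float) -> float:
    """Lemma 2(a): (E[X I(X <= mx)] - E[X] P(X <= mx)) / (r (1 - r))."""
    top = min(m, math.floor(m * x))
    e_x_ind = sum(i * binom_pmf(m, i, r) for i in range(top + 1))
    return (e_x_ind - m * r * G(m, r, x)) / (r * (1 - r))
```

For instance, `dG_dr(1000, 0.5, 0.5) / math.sqrt(1000)` is about -0.7977, close to the limit $-1/\sqrt{2\pi \cdot 0.25} \approx -0.7979$ predicted by (23).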
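Returning now to the proof of the theorem, we use Lemma 2 and put the pieces together. For the decaying step size $K_m = K/\sqrt{m}$ of part (d), we obtain that

$$2\gamma_m(\theta^*) - 1 = \frac{2K}{\sqrt{m}}\, p_X(\theta^*)\,\Big|\frac{\partial G_m}{\partial r}(\alpha^*, \alpha^*)\Big| - 1$$

converges to $\frac{2K p_X(\theta^*)}{\sqrt{2\pi\alpha^*(1-\alpha^*)}} - 1$, so that, since $R(\theta^*) \sim \frac{K^2}{4m}$, the asymptotic variance in (12) behaves as

$$\frac{1}{m}\cdot\frac{K^2\sqrt{2\pi\alpha^*(1-\alpha^*)}}{8K p_X(\theta^*) - 4\sqrt{2\pi\alpha^*(1-\alpha^*)}},$$

with $K > \frac{\sqrt{2\pi\alpha^*(1-\alpha^*)}}{2 p_X(\theta^*)}$ for stability. The constant step size case (c) follows in the same way, with $K_m = K$, $R(\theta^*) \sim \frac{K^2}{4}$ and $\gamma_m(\theta^*) \sim K p_X(\theta^*)\sqrt{\frac{m}{2\pi\alpha^*(1-\alpha^*)}}$. This completes the proof of the theorem.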



IV. Some extensions 

In this section, we consider some extensions of the algorithms and analysis from the preceding sections, 
including variations in the number of feedback bits, and the effects of noise. 
A. Different levels of feedback

We first consider the generalization of the preceding analysis to the case when the fusion center broadcasts some number of bits between 1 and log(m). The basic idea is to apply a quantizer with $2\ell$ levels, corresponding to $\log_2(2\ell)$ bits, to the update of the stochastic gradient algorithm. Note that the extreme cases, $\ell = 1$ and $\ell$ large enough (of order m/2) that the quantizer can resolve every possible value of the sum $\sum_i Y_{n+1}(i)$, correspond to the previously studied protocols. Given $2\ell$ levels, we partition the real line as

$$-\infty = s_{-\ell} < s_{-\ell+1} < \cdots < s_{\ell-1} < s_\ell = +\infty, \tag{27}$$

where the remaining breakpoints $\{s_k\}$ are to be specified. With this partition fixed, we define a quantization function $Q_\ell$ by

$$Q_\ell(x) := r_k \quad \text{if } x \in (s_k, s_{k+1}] \text{ for } k = -\ell, \ldots, \ell - 1, \tag{28}$$

where the $2\ell$ quantized values $(r_{-\ell}, \ldots, r_{\ell-1})$ are to be chosen. In the setting of the algorithm to be proposed, the quantizer is applied to the variable $x - X/m$, where X is a binomial random variable with parameters (m, r). Recall the function $G_m(r, x)$, as defined in equation (10), corresponding to the probability $\mathbb{P}[X \le mx]$. Let us define a new function $G_{m,\ell}$, corresponding to the expected value of the quantizer when applied to such a binomial variate, as follows:

$$G_{m,\ell}(r, x) := \sum_{k=-\ell}^{\ell-1} r_k\,\big\{G_m(r, x - s_k) - G_m(r, x - s_{k+1})\big\}. \tag{29}$$

With these definitions, the general $\log_2(2\ell)$-bit feedback algorithm takes the form shown in Table III.

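In order to understand the choice of the offset parameter $\beta$ defined in equation (33), we compute the expected value of the quantizer function, when $\theta_n = \theta^*$, as follows:

$$\mathbb{E}\Big[Q_\ell\Big(\alpha^* - \frac{1}{m}\sum_{i=1}^m Y_{n+1}(i)\Big)\Big] = \sum_{k=-\ell}^{\ell-1} r_k\,\big[G_m(F(\theta^*), \alpha^* - s_k) - G_m(F(\theta^*), \alpha^* - s_{k+1})\big] = G_{m,\ell}(F(\theta^*), \alpha^*).$$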
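A direct transcription of (29) may help in reading the general scheme. In this sketch (names hypothetical), breakpoints are passed as a list `s` = $[s_{-\ell}, \ldots, s_\ell]$ with infinite endpoints, and `levels` = $[r_{-\ell}, \ldots, r_{\ell-1}]$. With the 1-bit choice `s = [-inf, 0, +inf]`, `levels = [0, 1]`, $G_{m,\ell}$ reduces to $G_m$, and evaluating it at $(\alpha^*, \alpha^*)$ gives the offset $\beta$ of (33), because $F(\theta^*) = \alpha^*$.

```python
import math
from typing import Sequence


def G(m: int, r: float, y: float) -> float:
    """G_m(r, y) = P(Binomial(m, r) <= floor(m * y)), extended to y = +/-inf."""
    if y == math.inf:
        return 1.0
    if y == -math.inf:
        return 0.0
    top = min(m, math.floor(m * y))  # empty sum (0) when y < 0
    return sum(math.comb(m, i) * r**i * (1 - r) ** (m - i) for i in range(top + 1))


def G_ml(m: int, r: float, x: float, s: Sequence[float], levels: Sequence[float]) -> float:
    """G_{m,l}(r, x) from (29): expected quantizer output for the argument x - X/m."""
    if len(s) != len(levels) + 1:
        raise ValueError("need 2l + 1 breakpoints for 2l levels")
    # each bracket is the probability that x - X/m falls in the k-th cell
    return sum(rk * (G(m, r, x - s[k]) - G(m, r, x - s[k + 1]))
               for k, rk in enumerate(levels))
```

The bracketed differences telescope to 1, so $G_{m,\ell}$ is a weighted average of the quantizer outputs.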

Algorithm: Decentralized quantile estimation with $\log_2(2\ell)$-bit feedback

Given $K_m > 0$ (possibly depending on the number of sensors m) and variable step sizes $\epsilon_n > 0$:

(a) Local decision: each sensor computes the binary decision

$$Y_{n+1}(i) = \mathbb{I}\big(X_{n+1}(i) \le \theta_n\big) \tag{30}$$

and transmits it to the fusion center.

(b) Aggregate decision and parameter update: The fusion center computes the quantized aggregate decision variable

$$Z_{n+1} = Q_\ell\Big(\alpha^* - \frac{1}{m}\sum_{i=1}^m Y_{n+1}(i)\Big) \tag{31}$$

and uses it to update the parameter according to

$$\theta_{n+1} = \theta_n + \epsilon_n K_m\big(Z_{n+1} - \beta\big), \tag{32}$$

where the constant $\beta$ is chosen as

$$\beta := G_{m,\ell}\big(F(\theta^*), \alpha^*\big). \tag{33}$$

(c) Feedback: The fusion center broadcasts the aggregate quantized decision $Z_{n+1}$ back to the sensor nodes, using its $\log_2(2\ell)$ bits of feedback. The sensor nodes can then compute the updated parameter $\theta_{n+1}$.

TABLE III: Description of the general algorithm, with $\log_2(2\ell)$ bits of feedback.

The following result, analogous to Theorem 2, characterizes the behavior of this general protocol:

Theorem 3 (General feedback scheme): Given a random sequence $\{\theta_n\}$ generated by the general $\log_2(2\ell)$-bit feedback protocol, there exist choices of partition $\{s_k\}$ and quantization levels $\{r_k\}$ such that:

(a) For any initial condition, the sequence $\theta_n \to \theta^*$.

(b) There exists a choice of decaying step size (i.e., $K_m \propto \frac{1}{\sqrt{m}}$) such that the asymptotic variance of the protocol is given by $\kappa(\alpha^*, Q_\ell)\,\frac{\alpha^*(1-\alpha^*)}{p_X^2(\theta^*)\, m}$, where the constant has the form

$$\kappa(\alpha^*, Q_\ell) := 2\pi\,\frac{\sum_{k=-\ell}^{\ell-1} r_k^2\,\Delta G_m(s_k, s_{k+1}) - \beta^2}{\Big(\sum_{k=-\ell}^{\ell-1} r_k\,\Delta_m(s_k, s_{k+1})\Big)^2}, \tag{34}$$

with

$$\Delta G_m(s_k, s_{k+1}) = G_m(F(\theta^*), \alpha^* - s_k) - G_m(F(\theta^*), \alpha^* - s_{k+1}), \quad\text{and} \tag{35a}$$
$$\Delta_m(s_k, s_{k+1}) = \exp\Big(-\frac{m s_k^2}{2\alpha^*(1-\alpha^*)}\Big) - \exp\Big(-\frac{m s_{k+1}^2}{2\alpha^*(1-\alpha^*)}\Big). \tag{35b}$$
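The constant in (34) and (35) can likewise be evaluated numerically. This sketch (hypothetical names) evaluates $\kappa$ at $F(\theta^*) = \alpha^*$; for the 1-bit quantizer it should approach $\pi/2$ as m grows, matching Theorem 2. It does not reproduce the uniform-quantizer design behind Figure 3(a), whose breakpoints and output levels are not specified in this section.

```python
import math
from typing import Sequence


def binom_pmf(m: int, i: int, r: float) -> float:
    """Binomial(m, r) pmf in log space (valid for 0 < r < 1)."""
    return math.exp(math.lgamma(m + 1) - math.lgamma(i + 1) - math.lgamma(m - i + 1)
                    + i * math.log(r) + (m - i) * math.log(1 - r))


def G(m: int, r: float, y: float) -> float:
    """G_m(r, y) from (10), extended to y = +/-inf."""
    if y == math.inf:
        return 1.0
    if y == -math.inf:
        return 0.0
    top = min(m, math.floor(m * y))
    return sum(binom_pmf(m, i, r) for i in range(top + 1))


def kappa(m: int, alpha: float, s: Sequence[float], levels: Sequence[float]) -> float:
    """kappa(alpha*, Q_l) from (34) and (35), evaluated at F(theta*) = alpha*."""
    scale = 2 * alpha * (1 - alpha)

    def gauss(sk: float) -> float:
        # exp(-m s^2 / (2 alpha (1 - alpha))), which is 0 at s = +/-inf
        return 0.0 if math.isinf(sk) else math.exp(-m * sk * sk / scale)

    cells = range(len(levels))
    dG = [G(m, alpha, alpha - s[k]) - G(m, alpha, alpha - s[k + 1]) for k in cells]  # (35a)
    dA = [gauss(s[k]) - gauss(s[k + 1]) for k in cells]  # (35b)
    beta = sum(rk * g for rk, g in zip(levels, dG))  # offset (33)
    numerator = sum(rk * rk * g for rk, g in zip(levels, dG)) - beta**2
    denominator = sum(rk * a for rk, a in zip(levels, dA)) ** 2
    return 2 * math.pi * numerator / denominator
```

With m = 1000 and $\alpha^* = 0.5$, the 1-bit value is slightly below $\pi/2$, since $\beta(1-\beta)$ approaches 1/4 from below.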
We provide a formal proof of Theorem 3 in the Appendix. Figure 3(a) illustrates how the constant factor $\kappa$, as defined in equation (34), decreases as the number of levels $2\ell$ in a uniform quantizer is increased.

In order to provide comparison with results from the previous section, let us see how the two extreme cases (1-bit and m-bit feedback) can be obtained as special cases. For the 1-bit case, the quantizer has $2\ell = 2$ levels (i.e., $\ell = 1$), with breakpoints $s_{-1} = -\infty$, $s_0 = 0$, $s_1 = +\infty$, and quantizer outputs $r_{-1} = 0$ and $r_0 = 1$. By making the appropriate substitutions, we obtain:

$$\kappa(\alpha^*, Q_1) = 2\pi\,\frac{\Delta G_m(s_0, s_1)\,[1 - \Delta G_m(s_0, s_1)]}{\Delta_m(s_0, s_1)^2}, \qquad \beta = G_{m,1}(F(\theta^*), \alpha^*),$$
$$\Delta G_m(s_0, s_1) = G_m(F(\theta^*), \alpha^*) = \beta \quad\text{and}\quad \Delta_m(s_0, s_1) = 1.$$

By applying the central limit theorem, we conclude that

$$\Delta G_m(s_0, s_1)\,[1 - \Delta G_m(s_0, s_1)] = G_m(F(\theta^*), \alpha^*)\,[1 - G_m(F(\theta^*), \alpha^*)] \to \frac{1}{4},$$

as established earlier. Thus $\kappa(\alpha^*, Q_1) \to \pi/2$ as $m \to \infty$, recovering the result of Theorem 2. Similarly, the results for m-bf can be recovered by choosing the quantizer so that it reproduces its argument exactly: the outputs $\{r_k\}$ are the $m + 1$ possible values $\alpha^* - \frac{k}{m}$, $k = 0, \ldots, m$, of $\alpha^* - \frac{1}{m}\sum_i Y_{n+1}(i)$, and the breakpoints $\{s_k\}$ separate these values.




Fig. 3. (a) Plots of the asymptotic variance $\kappa(\alpha^*, Q_\ell)$ defined in equation (34) versus the number of levels $2\ell$ in a uniform quantizer, corresponding to $\log_2(2\ell)$ bits of feedback, for a sensor network with m = 4000 nodes. The plots show the asymptotic variance rescaled by the centralized gold standard, so that it starts at $\pi/2$ for the two-level (1-bit) quantizer, and decreases towards 1 as $\ell$ is increased towards m/2. (b) Plots of the asymptotic variances $V_m(\epsilon)$ and $V_1(\epsilon)$ defined in equation (39) as the feedforward noise parameter $\epsilon$ is increased from 0 towards $\frac{1}{2}$.



B. Extensions to noisy links 

We now briefly consider the effect of communication noise on our algorithms. There are two types of 
noise to consider: (a) feedforward, meaning noise in the link from sensor node to fusion center, and (b) 
feedback, meaning noise in the feedback link from fusion center to the sensor nodes. Here we show that
feedforward noise can be handled in a relatively straightforward way in our algorithmic framework. On the
other hand, feedback noise requires a different analysis, as the different sensors may lose synchronization
in their updating procedure. Although a thorough analysis of such asynchrony is an interesting topic
for future research, we note that assuming noiseless feedback is not unreasonable, since the fusion center
typically has greater transmission power.

Focusing then on the case of feedforward noise, let us assume that the link between each sensor and 
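the fusion center acts as a binary symmetric channel (BSC) with crossover probability $\epsilon \in [0, \tfrac{1}{2})$. More precisely, if a bit $x \in \{0, 1\}$ is transmitted, then the received bit $y$ has the (conditional) distribution

$$\mathbb{P}(y \mid x) = \begin{cases} 1 - \epsilon & \text{if } x = y, \\ \epsilon & \text{if } x \neq y. \end{cases} \qquad (37)$$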



With this bit-flipping noise, the updates of both algorithms (cf. equation (7)) need to be modified so as to correct for the bias introduced by the channel noise. If $\alpha^*$ denotes the desired quantile, then in the presence of BSC($\epsilon$) noise, both algorithms should be run with the modified parameter

$$\alpha(\epsilon) := (1 - 2\epsilon)\,\alpha^* + \epsilon. \qquad (38)$$

Note that $\alpha(\epsilon)$ ranges between $\alpha^*$ (for the noiseless case $\epsilon = 0$) and a quantity arbitrarily close to $\tfrac{1}{2}$, as the channel approaches the extreme of pure noise ($\epsilon = \tfrac{1}{2}$). The following proposition shows that for all $\epsilon < \tfrac{1}{2}$, the adjustment (38) suffices to correct the algorithm. Moreover, it specifies how the resulting asymptotic variance depends on the noise parameter:

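Proposition 1: Suppose that each of the $m$ feedforward links from sensor to fusion center is modeled as an i.i.d. BSC with crossover probability $\epsilon \in [0, \tfrac{1}{2})$. Then the m-bf and 1-bf algorithms, with the adjusted parameter $\alpha(\epsilon)$, are strongly consistent in computing the $\alpha^*$-quantile. Moreover, with appropriate step size choices, their asymptotic MSEs scale as $1/(mn)$, with respective pre-factors given by

$$V_m(\epsilon) = \frac{K^2\,\alpha(\epsilon)\big(1 - \alpha(\epsilon)\big)}{2K(1 - 2\epsilon)\,p_X(\theta^*) - 1}, \qquad (39a)$$

$$V_1(\epsilon) = \frac{K^2\sqrt{2\pi\,\alpha(\epsilon)\big(1 - \alpha(\epsilon)\big)}}{8K(1 - 2\epsilon)\,p_X(\theta^*) - 4\sqrt{2\pi\,\alpha(\epsilon)\big(1 - \alpha(\epsilon)\big)}}. \qquad (39b)$$

In both cases, the asymptotic MSE is minimal for $\epsilon = 0$.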

Proof: If sensor node $i$ transmits a bit $Y_{n+1}(i)$ at round $n+1$, then the fusion center receives the random variable

$$\tilde{Y}_{n+1}(i) = Y_{n+1}(i) \oplus W_{n+1},$$

where $W_{n+1}$ is Bernoulli with parameter $\epsilon$, and $\oplus$ denotes addition modulo two. Since $W_{n+1}$ is independent of the transmitted bit (which is Bernoulli with parameter $F(\theta_n)$), the received value $\tilde{Y}_{n+1}(i)$ is also Bernoulli, with parameter

$$\epsilon * F(\theta_n) := \epsilon\big(1 - F(\theta_n)\big) + (1 - \epsilon)\,F(\theta_n) = \epsilon + (1 - 2\epsilon)\,F(\theta_n). \qquad (40)$$

Consequently, if we set $\alpha(\epsilon)$ according to equation (38), both algorithms have their unique fixed point at $F(\theta) = \alpha^*$, since $\epsilon + (1 - 2\epsilon)F(\theta) = (1 - 2\epsilon)\alpha^* + \epsilon$ if and only if $F(\theta) = \alpha^*$; they therefore compute the $\alpha^*$-quantile of $X$. The claimed form of the asymptotic variances follows by performing calculations analogous to the proofs of Theorems 1 and 2. In particular, the partial derivative with respect to $\theta$ now has a multiplicative factor $(1 - 2\epsilon)$, arising from equation (40) and the chain rule. To establish that the asymptotic variance is minimized at $\epsilon = 0$, it suffices to note that the derivative of the MSE with respect to $\epsilon$ is positive, so that the MSE is an increasing function of $\epsilon$.

■ 
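The effect of the correction (38) can be seen in a small simulation. The Python sketch below is a hypothetical m-bf-style protocol, not necessarily the exact update (7): each sensor draws $X \sim N(0,1)$ (an assumed data model), sends $\mathbb{I}(X \le \theta_n)$ through a BSC($\epsilon$), the fusion center feeds back the fraction of received ones, and all nodes apply $\theta_{n+1} = \theta_n + \frac{K}{n+1}(\text{target} - \text{fraction})$ with an assumed step-size constant $K$. With the adjusted target $\alpha(\epsilon)$, the iterate settles near the $\alpha^*$-quantile; with the unadjusted target $\alpha^*$, equation (40) predicts a fixed point at the quantile of level $(\alpha^* - \epsilon)/(1 - 2\epsilon)$ instead.

```python
import random
from statistics import NormalDist


def simulate_mbf(alpha_star, eps, m, n_rounds, K, correct=True, seed=0):
    """Hypothetical m-bf-style quantile update with BSC(eps) feedforward links.

    Assumptions for illustration only: X ~ N(0, 1), theta_0 = 0,
    and step size K / (n + 1) at round n.
    """
    rng = random.Random(seed)
    # Adjusted target alpha(eps) from equation (38), or the naive target alpha*.
    target = (1 - 2 * eps) * alpha_star + eps if correct else alpha_star
    theta = 0.0
    for n in range(n_rounds):
        ones = 0
        for _ in range(m):
            bit = 1 if rng.gauss(0.0, 1.0) <= theta else 0  # sensor's bit I(X <= theta_n)
            if rng.random() < eps:                           # BSC flips it with prob. eps
                bit ^= 1
            ones += bit
        # Fusion center feeds back the fraction of ones; every node applies the same step.
        theta += K / (n + 1) * (target - ones / m)
    return theta


theta_hat = simulate_mbf(0.75, eps=0.1, m=100, n_rounds=1000, K=3.0)
theta_bad = simulate_mbf(0.75, eps=0.1, m=100, n_rounds=1000, K=3.0, correct=False)
print(theta_hat, NormalDist().inv_cdf(0.75))
print(theta_bad, NormalDist().inv_cdf((0.75 - 0.1) / (1 - 2 * 0.1)))
```

For $\alpha^* = 0.75$ and $\epsilon = 0.1$, the uncorrected run targets the $0.8125$-quantile, roughly $0.89$ for a standard normal, rather than the $0.75$-quantile, roughly $0.67$.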

Of course, as would be expected, both algorithms fail if $\epsilon = \tfrac{1}{2}$, which corresponds to pure noise. However, as summarized in Proposition 1, as long as $\epsilon < \tfrac{1}{2}$, feedforward noise does not affect the asymptotic rate itself, but only the pre-factor in front of the $1/(mn)$ rate. Figure 3(b) shows how the asymptotic variances $V_m(\epsilon)$ and $V_1(\epsilon)$ behave as $\epsilon$ is increased towards $\tfrac{1}{2}$.
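The pre-factors (39a) and (39b) are easy to tabulate. The Python sketch below assumes $X \sim N(0,1)$ and $\alpha^* = 0.75$ (illustrative choices) and, for each $\epsilon$, uses the gain $K$ that minimizes the corresponding expression: setting the derivative in $K$ to zero gives $K = 1/((1-2\epsilon)p_X(\theta^*))$ for (39a) and $K = \sqrt{2\pi\alpha(\epsilon)(1-\alpha(\epsilon))}/((1-2\epsilon)p_X(\theta^*))$ for (39b). These gains are our choice for illustration; the proposition itself only requires appropriate step sizes, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def alpha_eps(alpha_star, eps):
    """Adjusted target alpha(eps) from equation (38)."""
    return (1 - 2 * eps) * alpha_star + eps


def V_m(eps, K, alpha_star, p):
    """Pre-factor (39a); only meaningful when 2K(1 - 2 eps) p > 1."""
    a = alpha_eps(alpha_star, eps)
    den = 2 * K * (1 - 2 * eps) * p - 1
    if den <= 0:
        raise ValueError("gain K too small for this eps")
    return K ** 2 * a * (1 - a) / den


def V_1(eps, K, alpha_star, p):
    """Pre-factor (39b); only meaningful when 8K(1 - 2 eps) p > 4 sqrt(2 pi a (1 - a))."""
    a = alpha_eps(alpha_star, eps)
    c = math.sqrt(2 * math.pi * a * (1 - a))
    den = 8 * K * (1 - 2 * eps) * p - 4 * c
    if den <= 0:
        raise ValueError("gain K too small for this eps")
    return K ** 2 * c / den


def best_gains(eps, alpha_star, p):
    """Gains minimizing (39a) and (39b), from setting d/dK = 0."""
    q = (1 - 2 * eps) * p
    a = alpha_eps(alpha_star, eps)
    return 1 / q, math.sqrt(2 * math.pi * a * (1 - a)) / q


alpha_star = 0.75
theta_star = NormalDist().inv_cdf(alpha_star)  # alpha*-quantile of N(0, 1) (assumed model)
p = NormalDist().pdf(theta_star)               # density p_X(theta*)
for eps in [0.0, 0.1, 0.2, 0.3, 0.4]:
    Km, K1 = best_gains(eps, alpha_star, p)
    print(f"eps={eps:.1f}  V_m={V_m(eps, Km, alpha_star, p):8.3f}  "
          f"V_1={V_1(eps, K1, alpha_star, p):8.3f}")
```

With these gains, (39a) reduces to $\alpha(\epsilon)(1-\alpha(\epsilon))/((1-2\epsilon)^2 p_X^2(\theta^*))$, (39b) is exactly $\pi/2$ times that value, and both grow as $\epsilon$ increases towards $\tfrac{1}{2}$.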

V. Discussion 

In this paper, we have proposed and analyzed different approaches to the problem of decentralized 
quantile estimation under communication constraints. Our analysis focused on the fusion-centric archi- 
tecture, in which a set of m sensor nodes each collect an observation at each time step. After n rounds 
of this process, the centralized oracle would be able to estimate an arbitrary quantile with mean-squared 
error of order $O(1/(mn))$. In the decentralized formulation considered here, each sensor node is
allowed to transmit only a single bit of information to the fusion center. We then considered a range of
decentralized algorithms, indexed by the number of feedback bits that the fusion center is allowed to
transmit back to the sensor nodes. In the simplest case, we showed that a $\log m$-bit feedback algorithm
achieves the same asymptotic variance $O(1/(mn))$ as the centralized estimator. More interestingly, we
also showed that a 1-bit feedback scheme, with suitably designed step sizes, can achieve the
same $O(1/(mn))$ scaling as the centralized oracle, with a constant larger by a factor of $\pi/2$. We also showed that using intermediate amounts of
feedback (between 1 and $\log m$ bits) does not alter the scaling behavior, but improves the constant. Finally,
we showed how our algorithm can be adapted to the case of noise in the feedforward links from sensor 
nodes to fusion center, and the resulting effect on the asymptotic variance. 
Our analysis in the current paper has focused only on the fusion center architecture illustrated in
Figure 1. A natural generalization is to consider a more general communication network, specified by an
undirected graph on the sensor nodes. One possible formulation is to allow only pairs of sensor nodes 
connected by an edge in this communication graph to exchange a bit of information at each round. In 
this framework, the problem considered in this paper effectively corresponds to the complete graph, in 
which every node communicates with every other node at each round. This more general formulation 
raises interesting questions as to the effect of graph topology on the achievable rates and asymptotic 
variances. 
Appendix

Proof of Theorem 3

We proceed in an analogous manner to the proof of Theorem 1.

Lemma 3: For fixed $x \in [0, 1]$, the function $G_{m,\ell}(r, x)$ is non-negative, differentiable and monotonically decreasing in $r$.

Proof: First notice that, by definition,

$$G_{m,\ell}(r, x) = \mathbb{E}\Big[Q_\ell\Big(x - \frac{X}{m}\Big)\Big],$$

where $X$ is a $\mathrm{Bin}(m, r)$ random variable. Note that if $X' \sim \mathrm{Bin}(m, r')$ with $r' > r$, then $\mathbb{P}(X' \le t) \le \mathbb{P}(X \le t)$ for all $t$, meaning that $X'$ stochastically dominates $X$. Hence, for any constant $x$, the variable $x - X/m$ stochastically dominates $x - X'/m$. Furthermore, the quantizer $Q_\ell$ is, by definition, a monotonically non-decreasing function. Consequently, a standard result on stochastic domination [7, §4.12] implies that $G_{m,\ell}(r, x) \ge G_{m,\ell}(r', x)$. Differentiability follows from the definition of the function.

■ 

The finiteness of the variance of the quantization step is clear by construction; more specifically, a crude upper bound is $\max_k r_k^2$. Thus, analogous to the previous theorems, Lemma 3 is used to establish almost sure convergence.
Now, some straightforward algebra using the results of Lemma 2 shows that the partial derivative is

$$\frac{\partial G_{m,\ell}(r, x)}{\partial r} = \frac{1}{r(1-r)} \sum_{k=-1}^{\ell-1} r_k \Big( \mathbb{E}\Big[X\,\mathbb{I}\Big(x - s_{k+1} < \frac{X}{m} < x - s_k\Big)\Big] - \mathbb{E}[X]\,\mathbb{P}\Big(x - s_{k+1} < \frac{X}{m} < x - s_k\Big) \Big). \qquad (42)$$
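Equation (42) is the score-function identity $\frac{d}{dr}\mathbb{E}_r[f(X)] = \mathbb{E}_r[f(X)(X - mr)]/(r(1-r))$ for $X \sim \mathrm{Bin}(m, r)$, applied with $f(X) = Q_\ell(x - X/m)$. The Python sketch below checks it against a central finite difference and also checks the monotonicity claimed in Lemma 3. The quantizer (breakpoints $0, \pm 0.05$ and outputs $-3, -1, 1, 3$), the half-open cell convention, and the helper names are hypothetical choices made only for this check.

```python
import math


def binom_pmf(m, r):
    """Bin(m, r) pmf, computed in log space for numerical stability."""
    return [math.exp(math.lgamma(m + 1) - math.lgamma(j + 1) - math.lgamma(m - j + 1)
                     + j * math.log(r) + (m - j) * math.log1p(-r))
            for j in range(m + 1)]


def cell(y, s):
    """Index k with s[k] < y <= s[k+1] (half-open cells, an assumed convention)."""
    return next(k for k in range(len(s) - 1) if s[k] < y <= s[k + 1])


def G(r, x, m, s, outs):
    """G_{m,ell}(r, x) = E[Q_ell(x - X/m)] with X ~ Bin(m, r)."""
    return sum(p * outs[cell(x - j / m, s)] for j, p in enumerate(binom_pmf(m, r)))


def dG_dr(r, x, m, s, outs):
    """Right-hand side of (42): sum_k r_k (E[X I_k] - E[X] P(I_k)) / (r (1 - r))."""
    ex_i = [0.0] * len(outs)  # E[X I_k]
    p_i = [0.0] * len(outs)   # P(I_k)
    for j, p in enumerate(binom_pmf(m, r)):
        k = cell(x - j / m, s)
        ex_i[k] += j * p
        p_i[k] += p
    mean_x = m * r            # E[X] for X ~ Bin(m, r)
    return sum(rk * (e - mean_x * q) for rk, e, q in zip(outs, ex_i, p_i)) / (r * (1 - r))


# Hypothetical 4-output quantizer, used only for this check.
m, x = 50, 0.5
s = [-math.inf, -0.05, 0.0, 0.05, math.inf]
outs = [-3.0, -1.0, 1.0, 3.0]
r0, h = 0.45, 1e-6
fd = (G(r0 + h, x, m, s, outs) - G(r0 - h, x, m, s, outs)) / (2 * h)
print(dG_dr(r0, x, m, s, outs), fd)
```

The two printed numbers should agree to many digits, and the derivative is negative, as Lemma 3 requires.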



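Expression (42) will be used next. To compute the asymptotic variance, we again exploit asymptotic normality (see equation (24)), as before. Setting $Z := (X - \alpha^* m)/\sqrt{m}$, which at $r = \alpha^*$ is approximately $N(0, \alpha^*(1-\alpha^*))$, and taking $x = \alpha^*$, we obtain

$$\begin{aligned}
\mathbb{E}\big[X\,\mathbb{I}\big(m(\alpha^* - s_{k+1}) < X < m(\alpha^* - s_k)\big)\big]
&= \mathbb{E}\Big[X\,\mathbb{I}\Big(-\sqrt{m}\,s_{k+1} < \frac{X - \alpha^* m}{\sqrt{m}} < -\sqrt{m}\,s_k\Big)\Big] \\
&\approx \sqrt{m}\,\mathbb{E}\big[(Z + \alpha^*\sqrt{m})\,\mathbb{I}(-\sqrt{m}\,s_{k+1} < Z < -\sqrt{m}\,s_k)\big] \\
&= \sqrt{m}\,\mathbb{E}\big[Z\,\mathbb{I}(-\sqrt{m}\,s_{k+1} < Z < -\sqrt{m}\,s_k)\big] + S \\
&= \sqrt{m}\int_{-\sqrt{m}\,s_{k+1}}^{-\sqrt{m}\,s_k} \frac{z}{\sqrt{2\pi\alpha^*(1-\alpha^*)}}\,\exp\Big(-\frac{z^2}{2\alpha^*(1-\alpha^*)}\Big)\,dz + S,
\end{aligned}$$

where

$$S := \mathbb{E}[X]\,\mathbb{P}\big(m(x - s_{k+1}) < X < m(x - s_k)\big).$$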
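Now make the following definition, which corresponds to evaluating the integral above:

$$\Delta_m(s_k, s_{k+1}) = \exp\Big(-\frac{m s_k^2}{2\alpha^*(1-\alpha^*)}\Big) - \exp\Big(-\frac{m s_{k+1}^2}{2\alpha^*(1-\alpha^*)}\Big),$$

so that the integral equals $-\sqrt{\alpha^*(1-\alpha^*)/(2\pi)}\;\Delta_m(s_k, s_{k+1})$. Thus, plugging into equation (42) and noticing that $S$ cancels, we obtain

$$\frac{\partial G_{m,\ell}(r, \alpha^*)}{\partial r}\bigg|_{r = F(\theta^*)} = -\frac{\sqrt{m}}{\sqrt{2\pi\alpha^*(1-\alpha^*)}} \sum_{k=-1}^{\ell-1} r_k\,\Delta_m(s_k, s_{k+1}).$$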
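As a rough numerical check of this normal approximation, the Python sketch below compares the exact derivative (42) at $r = x = \alpha^*$ with the closed form just derived, for a hypothetical 4-output quantizer whose breakpoints scale as $1/\sqrt{m}$ (as suggested in the side note that follows). It assumes $\alpha^* = 1/2$ and half-open cells $s_k < y \le s_{k+1}$; the helper names are hypothetical. The printed ratio should approach 1 as $m$ grows.

```python
import math


def binom_pmf(m, r):
    """Bin(m, r) pmf, computed in log space for numerical stability."""
    return [math.exp(math.lgamma(m + 1) - math.lgamma(j + 1) - math.lgamma(m - j + 1)
                     + j * math.log(r) + (m - j) * math.log1p(-r))
            for j in range(m + 1)]


def exact_dG(m, alpha, s, outs):
    """Exact derivative (42) at r = x = alpha*, i.e. E[Q (X - m alpha*)] / (alpha*(1 - alpha*))."""
    total = 0.0
    for j, p in enumerate(binom_pmf(m, alpha)):
        y = alpha - j / m
        k = next(i for i in range(len(outs)) if s[i] < y <= s[i + 1])  # cell containing y
        total += outs[k] * (j - m * alpha) * p
    return total / (alpha * (1 - alpha))


def gaussian_dG(m, alpha, s, outs):
    """Normal approximation: -sqrt(m / (2 pi alpha*(1 - alpha*))) * sum_k r_k Delta_m(s_k, s_{k+1})."""
    var = alpha * (1 - alpha)

    def g(t):
        # exp(-m t^2 / (2 alpha*(1 - alpha*))); equals 0.0 at t = +/-inf
        return math.exp(-m * t * t / (2 * var))

    total = sum(rk * (g(s[k]) - g(s[k + 1])) for k, rk in enumerate(outs))
    return -math.sqrt(m / (2 * math.pi * var)) * total


alpha = 0.5
outs = [-3.0, -1.0, 1.0, 3.0]
for m in (250, 1000, 4000):
    c = math.sqrt(alpha * (1 - alpha) / m)  # breakpoints scale as 1/sqrt(m)
    s = [-math.inf, -c, 0.0, c, math.inf]
    print(m, exact_dG(m, alpha, s, outs) / gaussian_dG(m, alpha, s, outs))
```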



A side note: if one chooses $s_0 = 0$, we are guaranteed that at least one $\Delta_m(s_k, s_{k+1})$ does not go to zero for a fixed quantizer (i.e., a quantizer whose levels $s_k$ do not depend on $m$). However, the form of the correction factor, and indeed the optimal quantization of a Gaussian, suggests that the levels $s_k$ should scale as $1/\sqrt{m}$. In this case, the factor is a constant, independent of $m$.

We now need to compute $R(\theta^*)$ for the quantized update. It is also straightforward to see that this quantity is given by

$$R(\theta^*) = K_m^2\Big(\sum_{k=-1}^{\ell-1} r_k^2\big(G_m(F(\theta^*), \alpha^* - s_k) - G_m(F(\theta^*), \alpha^* - s_{k+1})\big) - \beta^2\Big).$$

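Putting everything together, we find that the asymptotic variance for the more general quantizer converges to

$$\frac{R(\theta^*)}{2K_m\Big|\frac{\partial G_{m,\ell}(r, \alpha^*)}{\partial r}\big|_{r = \alpha^*}\Big|\,p_X(\theta^*) - 1} = \frac{K_m^2\Big(\sum_{k=-1}^{\ell-1} r_k^2\,\Delta G_m(s_k, s_{k+1}) - \beta^2\Big)}{\dfrac{2K_m\sqrt{m}}{\sqrt{2\pi\alpha^*(1-\alpha^*)}}\Big(\sum_{k=-1}^{\ell-1} r_k\,\Delta_m(s_k, s_{k+1})\Big)p_X(\theta^*) - 1}.$$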

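Setting the gain $K := K_m\sqrt{m}\,\sum_{k=-1}^{\ell-1} r_k\,\Delta_m(s_k, s_{k+1}) \big/ \sqrt{2\pi\alpha^*(1-\alpha^*)}$, we obtain the final expression for the variance:

$$2\pi\,\frac{\sum_{k=-1}^{\ell-1} r_k^2\,\Delta G_m(s_k, s_{k+1}) - \beta^2}{\Big(\sum_{k=-1}^{\ell-1} r_k\,\Delta_m(s_k, s_{k+1})\Big)^2}\cdot\frac{K^2\,\alpha^*(1-\alpha^*)}{\big(2Kp_X(\theta^*) - 1\big)\,m},$$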



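where $\Delta G_m(s_k, s_{k+1}) = G_m(\alpha^*, \alpha^* - s_k) - G_m(\alpha^*, \alpha^* - s_{k+1})$. The constant $\kappa(\alpha^*, Q_\ell)$ defines the performance of the algorithm for different quantization choices:

$$\kappa(\alpha^*, Q_\ell) = 2\pi\,\frac{\sum_{k=-1}^{\ell-1} r_k^2\,\Delta G_m(s_k, s_{k+1}) - \beta^2}{\Big(\sum_{k=-1}^{\ell-1} r_k\,\Delta_m(s_k, s_{k+1})\Big)^2}.$$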
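The role of the gain $K$ in the final expression can be made explicit with a short worked minimization, treating $\kappa(\alpha^*, Q_\ell)$ as fixed and assuming $2Kp_X(\theta^*) > 1$ so that the expression is positive:

```latex
\frac{d}{dK}\,\frac{K^2}{2Kp_X(\theta^*) - 1}
  = \frac{2K\big(2Kp_X(\theta^*) - 1\big) - 2p_X(\theta^*)K^2}{\big(2Kp_X(\theta^*) - 1\big)^2}
  = \frac{2K\big(Kp_X(\theta^*) - 1\big)}{\big(2Kp_X(\theta^*) - 1\big)^2} = 0
  \;\Longrightarrow\; K = \frac{1}{p_X(\theta^*)},
\qquad
\min_{K}\,\frac{K^2}{2Kp_X(\theta^*) - 1} = \frac{1}{p_X^2(\theta^*)}.
```

With this gain, the variance becomes $\kappa(\alpha^*, Q_\ell)\,\alpha^*(1-\alpha^*)/(p_X^2(\theta^*)\,m)$, that is, $\kappa(\alpha^*, Q_\ell)$ times the centralized constant.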

The rate with respect to $m$ is the same, independent of quantization. It is clear from the previous analysis that if the best quantizers are chosen, then $1 \le \kappa(\alpha^*, Q_\ell) \le \pi/2$. Moreover, $\kappa(\alpha^*, Q_\ell)$ over the class of optimal quantizers is a decreasing function of $\ell$.